The aim of this report is to verify factors influencing final outcome of COVID19 disease. Correlations counted in Chapter 5 verify the theory about influence of 3 blood parameters mentioned in the article. Chapter 6 describes a trial of creating a prediction model basing on mentioned parameters.
In this report following R libraries have been used:
library(openxlsx)
library(ggplot2)
library(dplyr)
library(plotly)
library(corrplot)
library(caret)
library(summarytools)
library(backports)
library(skimr)
FileName <- "wuhan_blood_sample_data_Jan_Feb_2020.xlsx"
RawDataDF <- data.frame(openxlsx::read.xlsx(FileName, sheet = "Sheet1", colNames = TRUE, fillMergedCells = TRUE) )
knitr::kable(file.info(FileName))
| size | isdir | mode | mtime | ctime | atime | exe | |
|---|---|---|---|---|---|---|---|
| wuhan_blood_sample_data_Jan_Feb_2020.xlsx | 634526 | FALSE | 666 | 2021-05-10 15:31:09 | 2021-05-13 11:07:23 | 2021-06-05 22:58:14 | no |
ClearedDataDF <- mutate(RawDataDF, RE_DATE = openxlsx::convertToDateTime(RE_DATE), Admission.time=openxlsx::convertToDateTime(Admission.time),Discharge.time = openxlsx::convertToDateTime(Discharge.time))
GroupedByPatientDF <- aggregate(ClearedDataDF[, -c(1, 2, 5, 6)], by = list(ClearedDataDF$PATIENT_ID), FUN = mean, na.rm = TRUE) %>% rename(PATIENT_ID = Group.1) %>% arrange(PATIENT_ID)
GroupedByPatientDF["outcome"][GroupedByPatientDF["outcome"] == 1] <- "died"
GroupedByPatientDF["outcome"][GroupedByPatientDF["outcome"] == 0] <- "survived"
skim(GroupedByPatientDF)
| Name | GroupedByPatientDF |
| Number of rows | 375 |
| Number of columns | 78 |
| _______________________ | |
| Column type frequency: | |
| character | 1 |
| numeric | 77 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| outcome | 0 | 1 | 4 | 8 | 0 | 2 | 0 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| PATIENT_ID | 0 | 1.00 | 188.00 | 108.40 | 1.00 | 94.50 | 188.00 | 281.50 | 375.00 | ▇▇▇▇▇ |
| age | 0 | 1.00 | 58.83 | 16.46 | 18.00 | 46.00 | 62.00 | 70.00 | 95.00 | ▂▃▇▇▂ |
| gender | 0 | 1.00 | 1.40 | 0.49 | 1.00 | 1.00 | 1.00 | 2.00 | 2.00 | ▇▁▁▁▆ |
| Hypersensitive.cardiac.troponinI | 88 | 0.77 | 765.96 | 4273.16 | 1.90 | 2.56 | 13.00 | 75.97 | 50000.00 | ▇▁▁▁▁ |
| hemoglobin | 19 | 0.95 | 125.22 | 18.79 | 61.80 | 113.88 | 125.76 | 138.00 | 178.00 | ▁▂▇▅▁ |
| Serum.chloride | 21 | 0.94 | 102.41 | 6.58 | 74.60 | 99.00 | 101.80 | 104.58 | 140.20 | ▁▆▇▁▁ |
| Prothrombin.time | 23 | 0.94 | 15.61 | 5.08 | 11.50 | 13.50 | 14.31 | 16.02 | 83.35 | ▇▁▁▁▁ |
| procalcitonin | 62 | 0.83 | 0.88 | 3.21 | 0.02 | 0.04 | 0.10 | 0.44 | 29.24 | ▇▁▁▁▁ |
| eosinophils… | 19 | 0.95 | 0.68 | 0.94 | 0.00 | 0.00 | 0.25 | 0.98 | 5.55 | ▇▂▁▁▁ |
| Interleukin.2.receptor | 159 | 0.58 | 934.60 | 807.08 | 65.50 | 465.25 | 693.50 | 1172.50 | 7500.00 | ▇▁▁▁▁ |
| Alkaline.phosphatase | 19 | 0.95 | 82.27 | 48.51 | 17.00 | 54.62 | 69.29 | 92.40 | 620.00 | ▇▁▁▁▁ |
| albumin | 19 | 0.95 | 33.12 | 5.45 | 18.55 | 29.10 | 33.15 | 37.11 | 46.30 | ▂▆▇▇▂ |
| basophil… | 19 | 0.95 | 0.22 | 0.19 | 0.00 | 0.10 | 0.20 | 0.30 | 1.70 | ▇▂▁▁▁ |
| Interleukin.10 | 160 | 0.57 | 13.92 | 51.46 | 5.00 | 5.00 | 6.20 | 12.25 | 750.00 | ▇▁▁▁▁ |
| Total.bilirubin | 19 | 0.95 | 15.33 | 27.45 | 2.75 | 7.40 | 10.59 | 14.80 | 390.85 | ▇▁▁▁▁ |
| Platelet.count | 19 | 0.95 | 195.17 | 93.61 | -1.00 | 131.01 | 197.00 | 249.84 | 554.00 | ▃▇▅▁▁ |
| monocytes… | 19 | 0.95 | 6.62 | 3.93 | 0.50 | 3.59 | 6.48 | 9.00 | 35.20 | ▇▅▁▁▁ |
| antithrombin | 173 | 0.54 | 87.82 | 16.14 | 42.00 | 77.00 | 88.00 | 98.00 | 130.00 | ▁▅▇▅▂ |
| Interleukin.8 | 159 | 0.58 | 67.05 | 312.97 | 5.00 | 9.20 | 16.60 | 35.40 | 3409.00 | ▇▁▁▁▁ |
| indirect.bilirubin | 20 | 0.95 | 6.73 | 6.82 | 1.10 | 4.03 | 5.50 | 7.60 | 102.40 | ▇▁▁▁▁ |
| Red.blood.cell.distribution.width | 25 | 0.93 | 13.00 | 1.64 | 10.70 | 12.00 | 12.54 | 13.60 | 25.60 | ▇▂▁▁▁ |
| neutrophils… | 19 | 0.95 | 75.75 | 16.00 | 1.80 | 63.29 | 76.40 | 90.70 | 98.80 | ▁▁▃▆▇ |
| total.protein | 19 | 0.95 | 66.33 | 5.99 | 47.20 | 62.54 | 66.72 | 70.87 | 81.50 | ▁▂▇▇▁ |
| Quantification.of.Treponema.pallidum.antibodies | 100 | 0.73 | 0.13 | 0.78 | 0.02 | 0.04 | 0.05 | 0.07 | 11.95 | ▇▁▁▁▁ |
| Prothrombin.activity | 23 | 0.94 | 82.57 | 19.75 | 25.00 | 69.00 | 86.00 | 96.00 | 142.00 | ▁▃▇▅▁ |
| HBsAg | 100 | 0.73 | 8.43 | 43.23 | 0.00 | 0.00 | 0.01 | 0.01 | 250.00 | ▇▁▁▁▁ |
| mean.corpuscular.volume | 19 | 0.95 | 89.90 | 6.10 | 61.95 | 86.70 | 89.94 | 93.11 | 116.15 | ▁▁▇▂▁ |
| hematocrit | 19 | 0.95 | 36.86 | 5.06 | 17.56 | 33.88 | 36.77 | 39.88 | 52.30 | ▁▂▇▅▁ |
| White.blood.cell.count | 18 | 0.95 | 13.14 | 26.35 | 0.72 | 5.36 | 7.78 | 13.15 | 370.93 | ▇▁▁▁▁ |
| Tumor.necrosis.factor.U.03B1. | 159 | 0.58 | 11.62 | 10.99 | 4.00 | 6.90 | 8.43 | 11.60 | 103.55 | ▇▁▁▁▁ |
| mean.corpuscular.hemoglobin.concentration | 19 | 0.95 | 343.66 | 16.13 | 299.00 | 335.25 | 343.33 | 351.00 | 488.00 | ▃▇▁▁▁ |
| fibrinogen | 77 | 0.79 | 4.44 | 1.55 | 0.55 | 3.40 | 4.36 | 5.41 | 8.95 | ▂▆▇▅▁ |
| Interleukin.1ß | 159 | 0.58 | 6.57 | 7.06 | 5.00 | 5.00 | 5.00 | 5.00 | 88.50 | ▇▁▁▁▁ |
| Urea | 19 | 0.95 | 8.62 | 8.11 | 1.70 | 3.85 | 5.43 | 9.95 | 58.10 | ▇▁▁▁▁ |
| lymphocyte.count | 19 | 0.95 | 1.11 | 2.29 | 0.03 | 0.55 | 0.91 | 1.37 | 43.06 | ▇▁▁▁▁ |
| PH.value | 143 | 0.62 | 6.45 | 0.63 | 5.00 | 6.00 | 6.50 | 6.96 | 7.57 | ▃▇▇▇▅ |
| Red.blood.cell.count | 18 | 0.95 | 7.91 | 16.97 | 1.85 | 3.87 | 4.44 | 5.38 | 252.25 | ▇▁▁▁▁ |
| Eosinophil.count | 19 | 0.95 | 0.04 | 0.05 | 0.00 | 0.00 | 0.02 | 0.06 | 0.38 | ▇▂▁▁▁ |
| Corrected.calcium | 22 | 0.94 | 2.34 | 0.12 | 2.07 | 2.26 | 2.36 | 2.43 | 2.79 | ▃▆▇▁▁ |
| Serum.potassium | 21 | 0.94 | 4.41 | 0.63 | 3.13 | 4.02 | 4.34 | 4.67 | 6.86 | ▂▇▃▁▁ |
| glucose | 24 | 0.94 | 8.43 | 4.18 | 1.00 | 5.63 | 6.99 | 9.53 | 28.98 | ▇▇▂▁▁ |
| neutrophils.count | 19 | 0.95 | 7.38 | 5.38 | 0.32 | 3.30 | 5.47 | 10.70 | 26.80 | ▇▃▃▁▁ |
| Direct.bilirubin | 19 | 0.95 | 8.63 | 21.39 | 1.60 | 3.20 | 4.62 | 7.18 | 288.45 | ▇▁▁▁▁ |
| Mean.platelet.volume | 29 | 0.92 | 10.89 | 0.99 | 8.50 | 10.15 | 10.80 | 11.45 | 14.20 | ▂▇▇▂▁ |
| ferritin | 162 | 0.57 | 1488.11 | 3860.07 | 17.80 | 409.80 | 753.00 | 1418.30 | 50000.00 | ▇▁▁▁▁ |
| RBC.distribution.width.SD | 25 | 0.93 | 41.96 | 6.02 | 31.30 | 38.42 | 40.62 | 43.80 | 100.10 | ▇▂▁▁▁ |
| Thrombin.time | 77 | 0.79 | 17.63 | 5.31 | 13.60 | 15.80 | 16.64 | 18.00 | 92.75 | ▇▁▁▁▁ |
| X…lymphocyte | 19 | 0.95 | 16.69 | 12.26 | 0.15 | 5.11 | 14.70 | 26.33 | 52.35 | ▇▅▅▂▁ |
| HCV.antibody.quantification | 100 | 0.73 | 0.11 | 0.22 | 0.02 | 0.04 | 0.06 | 0.09 | 2.09 | ▇▁▁▁▁ |
| D.D.dimer | 33 | 0.91 | 6.20 | 8.06 | 0.21 | 0.51 | 1.35 | 12.03 | 40.50 | ▇▁▂▁▁ |
| Total.cholesterol | 19 | 0.95 | 3.67 | 0.87 | 1.00 | 2.98 | 3.64 | 4.23 | 6.16 | ▁▅▇▅▁ |
| aspartate.aminotransferase | 19 | 0.95 | 47.99 | 93.92 | 7.67 | 20.50 | 29.00 | 43.00 | 959.50 | ▇▁▁▁▁ |
| Uric.acid | 19 | 0.95 | 281.11 | 135.76 | 84.20 | 200.38 | 245.73 | 328.33 | 993.00 | ▇▅▁▁▁ |
| HCO3. | 19 | 0.95 | 22.86 | 3.67 | 10.00 | 21.25 | 23.52 | 25.38 | 30.07 | ▁▂▃▇▃ |
| calcium | 21 | 0.94 | 2.10 | 0.13 | 1.78 | 2.01 | 2.11 | 2.20 | 2.58 | ▂▇▇▂▁ |
| Amino.terminal.brain.natriuretic.peptide.precursor.NT.proBNP. | 108 | 0.71 | 2688.23 | 7776.71 | 5.00 | 66.25 | 318.00 | 1548.75 | 70000.00 | ▇▁▁▁▁ |
| Lactate.dehydrogenase | 19 | 0.95 | 462.10 | 343.36 | 116.00 | 224.00 | 306.43 | 610.25 | 1867.00 | ▇▂▁▁▁ |
| platelet.large.cell.ratio | 29 | 0.92 | 31.62 | 7.88 | 11.20 | 25.77 | 30.91 | 36.38 | 58.60 | ▂▇▇▂▁ |
| Interleukin.6 | 157 | 0.58 | 107.40 | 438.82 | 1.50 | 5.97 | 21.09 | 61.94 | 5000.00 | ▇▁▁▁▁ |
| Fibrin.degradation.products | 173 | 0.54 | 46.83 | 58.14 | 4.00 | 4.00 | 5.70 | 94.19 | 190.80 | ▇▁▁▂▁ |
| monocytes.count | 19 | 0.95 | 0.56 | 1.95 | 0.03 | 0.32 | 0.42 | 0.54 | 36.94 | ▇▁▁▁▁ |
| PLT.distribution.width | 29 | 0.92 | 13.00 | 2.58 | 8.20 | 11.15 | 12.60 | 14.16 | 24.20 | ▅▇▂▁▁ |
| globulin | 19 | 0.95 | 33.20 | 4.88 | 18.50 | 30.29 | 32.98 | 36.25 | 48.20 | ▁▃▇▃▁ |
| X.U.03B3..glutamyl.transpeptidase | 19 | 0.95 | 50.78 | 63.92 | 7.00 | 21.31 | 33.20 | 52.62 | 732.00 | ▇▁▁▁▁ |
| International.standard.ratio | 23 | 0.94 | 1.23 | 0.46 | 0.84 | 1.03 | 1.10 | 1.28 | 5.45 | ▇▁▁▁▁ |
| basophil.count… | 19 | 0.95 | 0.02 | 0.01 | 0.00 | 0.01 | 0.01 | 0.02 | 0.10 | ▇▂▁▁▁ |
| X2019.nCoV.nucleic.acid.detection | 157 | 0.58 | -1.00 | 0.00 | -1.00 | -1.00 | -1.00 | -1.00 | -1.00 | ▁▁▇▁▁ |
| mean.corpuscular.hemoglobin | 19 | 0.95 | 30.91 | 2.78 | 20.80 | 29.75 | 30.87 | 32.10 | 50.80 | ▁▇▂▁▁ |
| Activation.of.partial.thromboplastin.time | 77 | 0.79 | 40.91 | 7.47 | 21.80 | 36.41 | 39.50 | 43.77 | 102.75 | ▆▇▁▁▁ |
| High.sensitivity.C.reactive.protein | 22 | 0.94 | 70.41 | 71.10 | 0.10 | 10.20 | 49.00 | 118.10 | 320.00 | ▇▃▂▁▁ |
| HIV.antibody.quantification | 101 | 0.73 | 0.10 | 0.04 | 0.05 | 0.07 | 0.09 | 0.11 | 0.27 | ▇▃▂▁▁ |
| serum.sodium | 21 | 0.94 | 140.74 | 6.11 | 119.10 | 138.01 | 139.94 | 142.55 | 179.60 | ▁▇▂▁▁ |
| thrombocytocrit | 29 | 0.92 | 0.21 | 0.08 | 0.01 | 0.16 | 0.22 | 0.26 | 0.51 | ▂▇▇▂▁ |
| ESR | 87 | 0.77 | 33.87 | 24.12 | 1.00 | 15.00 | 30.00 | 47.00 | 106.00 | ▇▆▃▂▁ |
| glutamic.pyruvic.transaminase | 19 | 0.95 | 38.71 | 75.21 | 5.00 | 17.48 | 25.00 | 40.62 | 1061.00 | ▇▁▁▁▁ |
| eGFR | 19 | 0.95 | 84.04 | 29.77 | 2.15 | 67.57 | 89.38 | 104.71 | 215.45 | ▂▆▇▁▁ |
| creatinine | 19 | 0.95 | 106.30 | 136.83 | 12.50 | 58.00 | 75.49 | 96.19 | 1430.00 | ▇▁▁▁▁ |
Below chapter is a trial of finding correlations between attributes taken out of bloodtests and final outcome (if patient died or survived).
For every further correlation analysis this customized function was applied. Please note the threshold parameter that limits the output to significancy at desired level.
my_correlation <- function (selected_parameters, reference = "outcome", threshold = 0) {
df_cor <- GroupedByPatientDF[ , selected_parameters] %>% mutate_if(is.character, as.factor)%>% mutate_if(is.factor, as.numeric)
corr <- cor(df_cor, use= "pairwise.complete.obs", method = "spearman")
corr[lower.tri(corr,diag=TRUE)] <- NA
corr <- as.data.frame(as.table(corr))%>%na.omit(corr)%>%subset(Var1 == reference)
corr <- subset(corr, abs(Freq) >= threshold)
corr <- corr[order(-abs(corr$Freq)),]
mtx_corr <<- reshape2::acast(corr, Var1~Var2, value.var="Freq")
corrplot(mtx_corr, is.corr=FALSE, tl.col="black", na.label=" ", method = "number", cl.pos = "n", tl.cex = 0.85, number.cex = 0.8)
}
In order to distinguish attributes that may have the biggest impact on outcome a threshold level of 0.65 has been applied.
my_correlation(selected_parameters = 1:78, threshold = 0.65)
table <- as.data.frame(t(mtx_corr)) %>% arrange(outcome)
knitr::kable(table)
| outcome | |
|---|---|
| Lactate.dehydrogenase | -0.8151941 |
| Fibrin.degradation.products | -0.7926060 |
| neutrophils… | -0.7821134 |
| procalcitonin | -0.7783379 |
| High.sensitivity.C.reactive.protein | -0.7711912 |
| Amino.terminal.brain.natriuretic.peptide.precursor.NT.proBNP. | -0.7495679 |
| Hypersensitive.cardiac.troponinI | -0.7492457 |
| D.D.dimer | -0.7466308 |
| neutrophils.count | -0.7126498 |
| Prothrombin.time | -0.6907276 |
| International.standard.ratio | -0.6902048 |
| Interleukin.6 | -0.6681017 |
| eosinophils… | 0.6716401 |
| monocytes… | 0.6733578 |
| calcium | 0.6768312 |
| Prothrombin.activity | 0.6864320 |
| lymphocyte.count | 0.7353506 |
| albumin | 0.7408062 |
| X…lymphocyte | 0.7916369 |
According to the article there are three parameters that show signigficant correlation with the outcome. The further analysis is going to focus on them:
selected_parameters <- c("Lactate.dehydrogenase", "lymphocyte.count", "High.sensitivity.C.reactive.protein")
knitr::kable(summary(GroupedByPatientDF[ , selected_parameters]))
| Lactate.dehydrogenase | lymphocyte.count | High.sensitivity.C.reactive.protein | |
|---|---|---|---|
| Min. : 116.0 | Min. : 0.0250 | Min. : 0.10 | |
| 1st Qu.: 224.0 | 1st Qu.: 0.5463 | 1st Qu.: 10.20 | |
| Median : 306.4 | Median : 0.9137 | Median : 49.00 | |
| Mean : 462.1 | Mean : 1.1067 | Mean : 70.41 | |
| 3rd Qu.: 610.2 | 3rd Qu.: 1.3717 | 3rd Qu.:118.10 | |
| Max. :1867.0 | Max. :43.0550 | Max. :320.00 | |
| NA’s :19 | NA’s :19 | NA’s :22 |
Below analysis shows the correlations between the 3 parameters and likelihood of survival. What is interesting, age itself does not show very strong correlation with outcome (since outbreak of COVID19 higher age has always been treated as a potential threat for patient). Below chart shows also the dominant number of patients between 60-70 in comparison to other age groups:
ggplot(data=GroupedByPatientDF, aes(age)) +
geom_histogram(bins = 10)
selected_parameters <- c("outcome", "Lactate.dehydrogenase", "lymphocyte.count", "High.sensitivity.C.reactive.protein", "age")
my_correlation(selected_parameters)
knitr::kable(mtx_corr)
| Lactate.dehydrogenase | lymphocyte.count | High.sensitivity.C.reactive.protein | age | |
|---|---|---|---|---|
| outcome | -0.8151941 | 0.7353506 | -0.7711912 | -0.5688989 |
legend_colors <- c("died" = "#000000","survived" = "#119826")
ggplotly(
ggplot(GroupedByPatientDF, aes(x=age, y=Lactate.dehydrogenase ,col =outcome)) + geom_point() + scale_color_manual(values = legend_colors) + ggtitle("Lactate.dehydrogenase vs age")
)
The higher the level of Lactate.dehydrogenase, the higher the probability of death, majority of occurencies happened to older patients.
ggplotly(
ggplot(GroupedByPatientDF, aes(x=age, y=lymphocyte.count ,col =outcome)) + geom_point() + scale_color_manual(values = legend_colors) + ylim (0, 5) + ggtitle("lymphocyte.count vs age")
)
The lower the level of lymphocyte.count, the higher the probability of death, majority of occurencies happened to older patients.
ggplotly(
ggplot(GroupedByPatientDF, aes(x=age, y=High.sensitivity.C.reactive.protein ,col =outcome)) + geom_point() + scale_color_manual(values = legend_colors) + ggtitle("High.sensitivity.C.reactive.protein vs age")
)
The higher the level of High.sensitivity.C.reactive.protein, the higher the probability of death, majority of occurencies happened to older patients.
selected_parameters <- c("outcome", "Lactate.dehydrogenase", "lymphocyte.count", "High.sensitivity.C.reactive.protein")
ClassificationDF <- GroupedByPatientDF[ , c(selected_parameters)]
ClassificationDF <- ClassificationDF %>% filter(!is.nan(Lactate.dehydrogenase))
ClassificationDF <- ClassificationDF %>% filter(!is.nan(lymphocyte.count))
ClassificationDF <- ClassificationDF %>% filter(!is.nan(High.sensitivity.C.reactive.protein))
set.seed(23)
inTraining <- createDataPartition(
y = ClassificationDF$outcome,
p = .75,
list = FALSE)
training <- ClassificationDF[ inTraining,]
testing <- ClassificationDF[-inTraining,]
ctrl <- trainControl(
method = "repeatedcv",
number = 2,
repeats = 5)
fit <- train(outcome ~ .,
data = training,
method = "rf",
trControl = ctrl,
ntree = 10)
## note: only 2 unique complexity parameters in default grid. Truncating the grid to 2 .
fit
## Random Forest
##
## 264 samples
## 3 predictor
## 2 classes: 'died', 'survived'
##
## No pre-processing
## Resampling: Cross-Validated (2 fold, repeated 5 times)
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ...
## Resampling results across tuning parameters:
##
## mtry Accuracy Kappa
## 2 0.9378788 0.8748940
## 3 0.9333333 0.8655161
##
## Accuracy was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 2.
rfClasses <- predict(fit, newdata = testing)
confusionMatrix(table(rfClasses, testing$outcome))
## Confusion Matrix and Statistics
##
##
## rfClasses died survived
## died 37 3
## survived 2 45
##
## Accuracy : 0.9425
## 95% CI : (0.871, 0.9811)
## No Information Rate : 0.5517
## P-Value [Acc > NIR] : 4.78e-16
##
## Kappa : 0.8841
##
## Mcnemar's Test P-Value : 1
##
## Sensitivity : 0.9487
## Specificity : 0.9375
## Pos Pred Value : 0.9250
## Neg Pred Value : 0.9574
## Prevalence : 0.4483
## Detection Rate : 0.4253
## Detection Prevalence : 0.4598
## Balanced Accuracy : 0.9431
##
## 'Positive' Class : died
##
rfGrid <- expand.grid(mtry = 10:30)
gridCtrl <- trainControl(
method = "repeatedcv",
summaryFunction = twoClassSummary,
classProbs = TRUE,
number = 2,
repeats = 5)
set.seed(23)
fitTune <- train(outcome ~ .,
data = training,
method = "rf",
metric = "ROC",
preProc = c("center", "scale"),
trControl = gridCtrl,
tuneGrid = rfGrid,
ntree = 30)
fitTune
## Random Forest
##
## 264 samples
## 3 predictor
## 2 classes: 'died', 'survived'
##
## Pre-processing: centered (3), scaled (3)
## Resampling: Cross-Validated (2 fold, repeated 5 times)
## Summary of sample sizes: 132, 132, 132, 132, 132, 132, ...
## Resampling results across tuning parameters:
##
## mtry ROC Sens Spec
## 10 0.9755903 0.9283333 0.9472222
## 11 0.9769329 0.9266667 0.9472222
## 12 0.9743866 0.9316667 0.9361111
## 13 0.9764005 0.9300000 0.9388889
## 14 0.9764468 0.9283333 0.9430556
## 15 0.9742130 0.9266667 0.9416667
## 16 0.9761921 0.9300000 0.9472222
## 17 0.9738310 0.9266667 0.9444444
## 18 0.9770949 0.9283333 0.9416667
## 19 0.9757870 0.9216667 0.9486111
## 20 0.9728704 0.9283333 0.9444444
## 21 0.9764699 0.9316667 0.9444444
## 22 0.9762500 0.9366667 0.9388889
## 23 0.9767824 0.9333333 0.9458333
## 24 0.9760185 0.9316667 0.9388889
## 25 0.9750926 0.9216667 0.9416667
## 26 0.9755787 0.9266667 0.9472222
## 27 0.9770486 0.9266667 0.9472222
## 28 0.9757407 0.9283333 0.9416667
## 29 0.9752662 0.9266667 0.9402778
## 30 0.9760069 0.9216667 0.9444444
##
## ROC was used to select the optimal model using the largest value.
## The final value used for the model was mtry = 18.
ggplot(fitTune) + theme_bw()